Impact of SEC the decision on Bitcoin transactions network¶

The US financial markets watchdog, the SEC, gave the go-ahead on Wednesday for the listing of a new bitcoin investment product, a decision seen as a major step towards the adoption of cryptocurrencies, which could shake up the sector. The aim of our work is to analyze the impact of the Security Exchange Comission's 10/10/2024 announcements (SEC decision) on bitcoin transactions.

The SEC, the US financial markets watchdog, has given the go-ahead for the listing of an ETF (exchange traded fund), an index fund that enables investors to profit from bitcoin's developments without investing their money directly in the digital currency. In concrete terms, investors buy shares in the fund, which they can sell at any time, rather than bitcoin. The fund's assets are invested in cryptocurrencies.

The market had already reached a milestone, in October 2021, with the listing of the first ETF invested not directly in bitcoin but in futures contracts linked to the cryptocurrency. Until now, access to digital currencies required opening an account on a cryptocurrency exchange platform and converting a traditional currency (issued by a central bank), such as the dollar.

By accepting the application of 11 investment companies, including BlackRock, the SEC has authorised a new bitcoin investment product. This is a major step forward for the adoption of cryptocurrencies, which could revolutionise the industry. This approval could "attract a significant amount of capital to the sector and is expected to have a potentially massive impact on valuations across the entire cryptocurrency sector", Swissquote analyst Ipek Ozkardeskaya.

Many investors had anticipated this authorisation. Over the last twelve months, the price of the leading cryptocurrency by capitalisation has soared by more than 150%. Bitcoin is now worth more than 45,000 dollars. In an uncertain geopolitical climate, some investors are also tending to turn to cryptoassets - which are reputed to be volatile - as a safe haven, BlackRock CEO Larry Fink recently said.

Import libs¶

In [1]:
#!pip install nxviz
In [2]:
import os
import pandas as pd
import numpy as np

### Ondine path ###
# path = os.chdir('/Users/ondineji/Library/Mobile Documents/com~apple~CloudDocs/Documents/ACADEMIQUE/M2 FTD/S2/Network Analysis in Python /projet/data/')

### Hugo path ###
path = os.chdir('C:/Users/humic/OneDrive/Documents/Ecole/SorbonneFTD/Cours/Data_Mining_Network/project/data/')
pd.set_option('display.max_columns', None)

import warnings
warnings.filterwarnings("ignore")

# import data viz libs
import matplotlib.pyplot as plt
from matplotlib import cm

# Import graph libs
import networkx as nx
from nxviz import CircosPlot, ArcPlot
from cdlib import algorithms

# import custom functions

from utils import *
Note: to be able to use all crisp methods, you need to install some additional packages:  {'infomap', 'wurlitzer', 'graph_tool', 'bayanpy', 'leidenalg'}
Note: to be able to use all crisp methods, you need to install some additional packages:  {'ASLPAw', 'pyclustering'}
Note: to be able to use all crisp methods, you need to install some additional packages:  {'infomap', 'wurlitzer', 'leidenalg'}

PART I : Exploratory Data Analysis (EDA)¶

/!\ DO NOT RUN THE FOLLOWING CELLS UNTIL THE OPPOSITE INSTRUCTION /!\¶

Bitcoin stock price Analysis¶

In [3]:
import yfinance as yf

def get_bitcoin_stock_prices(start_date, end_date):
    """
    Retrieves daily Bitcoin stock prices for the specified date range.
    
    Parameters:
        start_date (str): Start date in the format 'YYYY-MM-DD'.
        end_date (str): End date in the format 'YYYY-MM-DD'.
        
    Returns:
        DataFrame: DataFrame containing daily Bitcoin stock prices for the specified date range.
    """
    # Define the ticker symbol for Bitcoin (BTC-USD)
    ticker_symbol = 'BTC-USD'
    
    # Create ticker object
    ticker = yf.Ticker(ticker_symbol)
    
    # Get historical data for the specified date range
    historical_data = ticker.history(start=start_date, end=end_date)
    
    return historical_data

# Example usage
start_date = '2024-01-01'
end_date = '2024-01-10'
bitcoin_prices_df = get_bitcoin_stock_prices(start_date, end_date)
bitcoin_prices_df
Out[3]:
Open High Low Close Volume Dividends Stock Splits
Date
2024-01-01 00:00:00+00:00 42280.234375 44175.437500 42214.976562 44167.332031 18426978443 0.0 0.0
2024-01-02 00:00:00+00:00 44187.140625 45899.707031 44176.949219 44957.968750 39335274536 0.0 0.0
2024-01-03 00:00:00+00:00 44961.601562 45503.242188 40813.535156 42848.175781 46342323118 0.0 0.0
2024-01-04 00:00:00+00:00 42855.816406 44770.023438 42675.175781 44179.921875 30448091210 0.0 0.0
2024-01-05 00:00:00+00:00 44192.980469 44353.285156 42784.718750 44162.691406 32336029347 0.0 0.0
2024-01-06 00:00:00+00:00 44178.953125 44227.632812 43475.156250 43989.195312 16092503468 0.0 0.0
2024-01-07 00:00:00+00:00 43998.464844 44495.570312 43662.230469 43943.097656 19330573863 0.0 0.0
2024-01-08 00:00:00+00:00 43948.707031 47218.000000 43244.082031 46970.503906 42746192015 0.0 0.0
2024-01-09 00:00:00+00:00 46987.640625 47893.699219 45244.714844 46139.730469 39821290992 0.0 0.0

Import Data¶

Import block¶

Column Description
id Unique identifier for each block record.
transaction_hash Hash of the block's transaction.
time Timestamp indicating when the block was mined, in Unix epoch format.
median_time Median timestamp of the block.
size Size of the block in bytes.
stripped_size Size of the block after removing witness data (for Segregated Witness transactions).
weight The weight of the block (considering transaction size and witness data), used in Segregated Witness.
version The version of the block.
version_hex Hexadecimal representation of the block version.
version_bits Version bits for signaling soft forks.
merkle_root Root of the Merkle tree of all transactions in the block.
nonce A random value that miners increment to find a valid proof-of-work.
bits Target threshold for the proof-of-work hash.
difficulty Difficulty of finding a valid block hash.
chainwork Cumulative work done to build the blockchain.
coinbase_data_hex Hexadecimal representation of the coinbase transaction data.
transaction_count Total number of transactions in the block.
witness_count Number of witness transactions (for Segregated Witness).
input_count Total number of inputs in all transactions.
output_count Total number of outputs in all transactions.
input_total Total value of all inputs in the block (in bitcoin).
input_total_usd Total value of all inputs in the block (in USD).
output_total Total value of all outputs in the block (in bitcoin).
output_total_usd Total value of all outputs in the block (in USD).
fee_total Total transaction fees collected by miners for all transactions in the block (in bitcoin).
fee_total_usd Total transaction fees collected by miners for all transactions in the block (in USD).
fee_per_kb Average transaction fee per kilobyte of the block.
fee_per_kb_usd Average transaction fee per kilobyte of the block (in USD).
fee_per_kwu Average transaction fee per kiloweight unit (Segregated Witness weight unit) of the block.
fee_per_kwu_usd Average transaction fee per kiloweight unit of the block (in USD).
cdd_total Cumulative Days Destroyed (a measure of how much existing coin supply is transacted).
generation Reward generated by the coinbase transaction (in bitcoin).
generation_usd Reward generated by the coinbase transaction (in USD).
reward Total reward for miners (including transaction fees) for mining the block (in bitcoin).
reward_usd Total reward for miners (including transaction fees) for mining the block (in USD).
guessed_miner Estimation or guess of the mining pool or entity that mined the block based on analysis or heuristics.
In [4]:
tsv_block = "dump_bitcoin_10_01_2024/blockchair_bitcoin_blocks_20240110.tsv"

# Read the TSV file into a pandas DataFrame
df_blocks = pd.read_csv(tsv_block, sep='\t')
df_blocks = df_blocks.rename(columns={'hash': 'transaction_hash'})

# Display the DataFrame
print("#### Block dimension ####")
print(df_blocks.shape)

print("\n#### Block attributes ####")
print(df_blocks.columns.tolist())

df_blocks
#### Block dimension ####
(133, 36)

#### Block attributes ####
['id', 'transaction_hash', 'time', 'median_time', 'size', 'stripped_size', 'weight', 'version', 'version_hex', 'version_bits', 'merkle_root', 'nonce', 'bits', 'difficulty', 'chainwork', 'coinbase_data_hex', 'transaction_count', 'witness_count', 'input_count', 'output_count', 'input_total', 'input_total_usd', 'output_total', 'output_total_usd', 'fee_total', 'fee_total_usd', 'fee_per_kb', 'fee_per_kb_usd', 'fee_per_kwu', 'fee_per_kwu_usd', 'cdd_total', 'generation', 'generation_usd', 'reward', 'reward_usd', 'guessed_miner']
Out[4]:
id transaction_hash time median_time size stripped_size weight version version_hex version_bits merkle_root nonce bits difficulty chainwork coinbase_data_hex transaction_count witness_count input_count output_count input_total input_total_usd output_total output_total_usd fee_total fee_total_usd fee_per_kb fee_per_kb_usd fee_per_kwu fee_per_kwu_usd cdd_total generation generation_usd reward reward_usd guessed_miner
0 825071 000000000000000000031cb6b4df6a1238403d07055b05... 2024-01-10 00:35:31 2024-01-09 23:27:46 1598270 798354 3993332 583933952 22ce2000 100010110011100010000000000000 3e1746a2512b8946846757735e834d5819f538eb8366e9... 1209507708 386127977 73197634206448 000000000000000000000000000000000000000064d924... 03ef960c1462696e616e63652f313030382e00a0020f93... 2962 2751 7090 12208 4384137053122 2015519360 4384762053122 2.015807e+09 112190208 51577.2031 70211.560 32.2786 28104.379 12.9203 165530.082047 625000000 287331.25 737190208 338908.4688 Binance
1 825072 00000000000000000002cc13fbb98f2edf738fd007b138... 2024-01-10 01:03:55 2024-01-09 23:27:57 1649184 781269 3992991 615956480 24b6c000 100100101101101100000000000000 21640b5e51e67c566033f802db0572b3fafa7f217afd96... 2083341607 386127977 73197634206448 000000000000000000000000000000000000000064d966... 03f0960c0497ec9d652f466f756e647279205553412050... 2432 2260 8452 8875 5102700579174 2345864448 5103325579174 2.346152e+09 95000389 43674.5273 57614.855 26.4873 23798.223 10.9407 275368.332663 625000000 287331.25 720000389 331005.7812 Foundry USA Pool
2 825073 000000000000000000016a8554507d61e6dbc70a4216de... 2024-01-10 01:17:27 2024-01-09 23:28:21 1589258 801401 3993461 660176896 27598000 100111010110011000000000000000 7ce942ef3713356330b5595b8197fe6e506e7cd4ebc454... 70100840 386127977 73197634206448 000000000000000000000000000000000000000064d9a9... 03f1960c172f5669614254432f4d696e65642062792061... 3332 3089 7553 10199 1164968002118 535570752 1165593002118 5.358581e+08 75508770 34713.6484 47523.836 21.8482 18915.113 8.6958 248606.733430 625000000 287331.25 700508770 322044.9062 ViaBTC
3 825074 00000000000000000000104d23d4b3d2c220ea2555c4a0... 2024-01-10 01:17:57 2024-01-09 23:39:32 1782086 737022 3993152 551550976 20e00000 100000111000000000000000000000 3298322ba382ce0b3c76dea2c4fbbaff90ff9af01f71ce... 4050443161 386127977 73197634206448 000000000000000000000000000000000000000064d9eb... 03f2960c082f5669614254432f2cfabe6d6d8ec77aa8ad... 2147 1993 6189 8087 699464942935 321565024 700089942935 3.218524e+08 45535828 20934.1855 25557.459 11.7493 11407.536 5.2446 244141.365342 625000000 287331.25 670535828 308265.4375 ViaBTC
4 825075 0000000000000000000128f2f60d43b78b9459b423ae17... 2024-01-10 01:26:31 2024-01-09 23:45:38 1932353 687086 3993611 604389376 24064000 100100000001100100000000000000 12289ed1d94938edd2bb59ac9bc0301c8495880c2b7c8b... 2337013785 386127977 73197634206448 000000000000000000000000000000000000000064da2e... 03f3960c194d696e656420627920416e74506f6f6c2097... 2072 1971 8827 6012 577419223479 265456944 578044223479 2.657443e+08 65839947 30268.5996 34080.800 15.6681 16493.720 7.5828 246041.961771 625000000 287331.25 690839947 317599.8438 AntPool
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
128 825199 0000000000000000000056407ba45b95b4697e0b78bc96... 2024-01-10 22:58:25 2024-01-10 22:22:53 2274015 574598 3997809 647225344 2693e000 100110100100111110000000000000 c30a02251c49123f820cfaccadbdfb26338ddd41dd03a5... 3073202514 386127977 73197634206448 000000000000000000000000000000000000000064fa6d... 036f970c2cfabe6d6d4965851c60167dca5ba4367b311e... 2462 2362 4846 7027 320955941876 147553072 321580941876 1.478404e+08 50167919 23063.6973 22066.178 10.1444 12554.720 5.7719 11822.013023 625000000 287331.25 675167919 310394.9375 F2Pool
129 825200 0000000000000000000195e521b2ee04d52bdac56069de... 2024-01-10 23:03:46 2024-01-10 22:31:11 1703348 764891 3998021 695418880 29734000 101001011100110100000000000000 40a7d67da1e57a8f35f8b0738b58957e7596d98cd9f16f... 1728276037 386127977 73197634206448 000000000000000000000000000000000000000064fab0... 0370970c2cfabe6d6d0aa5ea12b0c8c3511793a8d806db... 3351 3208 6930 8188 460996648195 211933984 461621648195 2.122213e+08 59336634 27278.8301 34846.363 16.0198 14849.138 6.8265 21846.099858 625000000 287331.25 684336634 314610.0938 F2Pool
130 825201 00000000000000000003708536678656de241850f2c8d1... 2024-01-10 23:26:20 2024-01-10 22:34:45 1619503 791389 3993670 805830656 30080000 110000000010000000000000000000 3331907f6e0c95df7a1878713fa7c5236bae800511b9bc... 1406715446 386127977 73197634206448 000000000000000000000000000000000000000064faf2... 0371970c194d696e656420627920416e74506f6f6c20b4... 3595 3411 6507 12417 3922763169330 1803411968 3923388169330 1.803699e+09 97612928 44875.5898 60291.070 27.7176 24452.883 11.2418 218458.561485 625000000 287331.25 722612928 332206.8438 AntPool
131 825202 00000000000000000003ceb0e97b1b7de0c449546f7364... 2024-01-10 23:34:19 2024-01-10 22:35:09 1671181 775555 3997846 536969216 20018000 100000000000011000000000000000 ba5415ff7db9eafd6bbdf0adaf889a7f9a1e3794311965... 3156739299 386127977 73197634206448 000000000000000000000000000000000000000064fb35... 0372970c2cfabe6d6dcdf07697f67a6a92071ea4d08954... 3820 3649 6601 11084 1233991147461 567302720 1234616147461 5.675901e+08 61210942 28140.5059 36638.188 16.8436 15318.138 7.0421 30406.603814 625000000 287331.25 686210942 315471.7500 F2Pool
132 825203 0000000000000000000357f2c0411a907b380bb3caae77... 2024-01-10 23:51:21 2024-01-10 22:55:42 1695867 765649 3992814 562544640 2187c000 100001100001111100000000000000 9327da31731e160a0b624d6881cdcca1bc2bf22fb31896... 3473839639 386127977 73197634206448 000000000000000000000000000000000000000064fb77... 0373970c045a2d9f652f466f756e647279205553412050... 3801 3642 6388 11549 3199955342134 1471115520 3200580342134 1.471403e+09 74880785 34424.9414 44162.600 20.3031 18758.960 8.6241 145833.700324 625000000 287331.25 699880785 321756.1875 Foundry USA Pool

133 rows × 36 columns

Import transactions¶

Column Description
block_id Identifier of the block in which the transaction is included.
transaction_hash Unique hash identifier of the transaction.
time Timestamp of when the transaction was recorded in the blockchain.
size Size of the transaction in bytes.
weight Weight of the transaction, accounting for witness data in SegWit transactions.
version Version number of the transaction format.
lock_time The earliest time or block height when a transaction can be added to the blockchain.
is_coinbase Binary indicator (True/False) for coinbase transactions.
has_witness Binary indicator (True/False) if the transaction includes witness data (SegWit).
input_count Number of inputs (UTXOs) in the transaction.
output_count Number of outputs (destination addresses) in the transaction.
input_total Total value of the inputs in satoshis.
input_total_usd Total value of the inputs in USD.
output_total Total value of the outputs in satoshis.
output_total_usd Total value of the outputs in USD.
fee Transaction fee paid by the sender in satoshis.
fee_usd Transaction fee paid by the sender in USD.
fee_per_kb Transaction fee rate per kilobyte of transaction size.
fee_per_kb_usd Transaction fee rate per kilobyte in USD.
fee_per_kwu Transaction fee rate per kiloweight unit (kwu).
fee_per_kwu_usd Transaction fee rate per kiloweight unit in USD.
cdd_total Cumulative Daily Difficulty (CDD) total. A measure related to historical network difficulty.
In [5]:
tsv_transactions = "dump_bitcoin_10_01_2024/blockchair_bitcoin_transactions_20240110.tsv"

# Read the TSV file into a pandas DataFrame
df_transactions = pd.read_csv(tsv_transactions, sep='\t')
df_transactions = df_transactions.rename(columns={'hash': 'transaction_hash'})

# Display the DataFrame
print("#### Transaction dimension ####")
print(df_transactions.shape)

print("\n#### Transaction attributes ####")
print(df_transactions.columns.tolist())

df_transactions
#### Transaction dimension ####
(383774, 22)

#### Transaction attributes ####
['block_id', 'transaction_hash', 'time', 'size', 'weight', 'version', 'lock_time', 'is_coinbase', 'has_witness', 'input_count', 'output_count', 'input_total', 'input_total_usd', 'output_total', 'output_total_usd', 'fee', 'fee_usd', 'fee_per_kb', 'fee_per_kb_usd', 'fee_per_kwu', 'fee_per_kwu_usd', 'cdd_total']
Out[5]:
block_id transaction_hash time size weight version lock_time is_coinbase has_witness input_count output_count input_total input_total_usd output_total output_total_usd fee fee_usd fee_per_kb fee_per_kb_usd fee_per_kwu fee_per_kwu_usd cdd_total
0 825071 ef823e020330de717cc72f834cc3dc7f32baf4ea7e3e5b... 2024-01-10 00:35:31 299 1088 1 0 1 1 1 3 0 0.000000e+00 737190208 3.389085e+05 0 0.0000 0.000 0.0000 0.00 0.0000 0.000000e+00
1 825071 3f325a9160b6d49b1a60bdeb56689838a7fd9c38f2af56... 2024-01-10 00:35:31 225 573 1 0 0 1 1 2 465974173 2.142223e+05 465864157 2.141717e+05 110016 50.5777 488960.000 224.7896 192000.00 88.2682 9.743930e+00
2 825071 5c56ae6de10afbcc1786c3093e0fb92d4260d1cb9090ec... 2024-01-10 00:35:31 225 900 1 0 0 0 1 2 8729714 4.013311e+03 8583260 3.945982e+03 146454 67.3293 650906.700 299.2415 162726.67 74.8105 1.429036e-01
3 825071 ef3cde8f1ce0c98388b973f855aaee003e9265dfa09b38... 2024-01-10 00:35:31 226 574 1 0 0 1 1 2 210000000 9.654330e+04 209928000 9.651020e+04 72000 33.1006 318584.060 146.4626 125435.54 57.6667 5.841597e-01
4 825071 ef91085cbe36fb48b8391ada7af60e18b6e752eb4efc3a... 2024-01-10 00:35:31 419 1025 1 0 0 1 2 2 17000493887 7.815637e+06 17000369903 7.815580e+06 123984 56.9992 295904.530 136.0364 120960.00 55.6089 1.437647e+01
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
383769 825203 6fdf800be6274f476ad89322818a5ba54ba9f9cd530792... 2024-01-10 23:51:21 379 625 1 0 0 1 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 17612.137 8.0968 10680.00 4.9099 8.243424e-07
383770 825203 cf81f0ca6c1ec5639ff010dfe44a4fbe41708e0eb6ab80... 2024-01-10 23:51:21 379 625 1 0 0 1 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 17612.137 8.0968 10680.00 4.9099 8.243424e-07
383771 825203 fb8907c6983b9569c2eb8532e58b2db83752799390e69e... 2024-01-10 23:51:21 379 625 1 0 0 1 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 17612.137 8.0968 10680.00 4.9099 8.243424e-07
383772 825203 37e8e17e238972b852f143c6c3473128d821da14159f3b... 2024-01-10 23:51:21 379 625 1 0 0 1 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 17612.137 8.0968 10680.00 4.9099 8.243424e-07
383773 825203 b7311d24739cdf0a680f5c70ee09ea15e2134e3fc1a7ff... 2024-01-10 23:51:21 379 625 1 0 0 1 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 17612.137 8.0968 10680.00 4.9099 8.243424e-07

383774 rows × 22 columns

Import adresses¶

Column Description
address Bitcoin address, a unique identifier on the blockchain.
balance The balance of Bitcoin associated with the address, measured in satoshis.
In [6]:
tsv_adresses = "dump_bitcoin_10_01_2024/blockchair_bitcoin_addresses_latest.tsv"

# Read the TSV file into a pandas DataFrame
df_adresses = pd.read_csv(tsv_adresses, sep='\t')
df_adresses = df_adresses.rename(columns={'address': 'hash_address'})

# Display the DataFrame
print("#### Adresses dimension ####")
print(df_adresses.shape)

print("\n#### Adresses attributes ####")
print(df_adresses.columns.tolist())

df_adresses
#### Adresses dimension ####
(52598833, 2)

#### Adresses attributes ####
['hash_address', 'balance']
Out[6]:
hash_address balance
0 34xp4vRoCGJym3xR7yCVPFHoCNxv4Twseo 24859746692320
1 bc1qgdjqv0av3q56jvd82tkdjpy7gdp9ut8tlqmgrpmv24... 20401007547456
2 bc1ql49ydapnjafl5t2cp9zqpjwe6pdgmxy98859v2 12713620120811
3 39884E3j6KZj82FK4vcCrkUvWYL5MQaS3v 11517738865340
4 bc1qazcm763858nkj2dj986etajv6wquslv8uxwczt 9464330762512
... ... ...
52598828 1AyjGkoaiNir5PRMTvyTHSyCLguyEgYpLv 1
52598829 16WZ9Fi8EDqyPCeLmEekWuPqxhk7t5K4fb 1
52598830 1FaUaprkrAZnWZPCj5JZaK9R18W9XBzD81 1
52598831 1JNuAjoN9cj8zyHfJ45nfhMMEgkoVwLvkT 1
52598832 13xUr3oovz6rd2w9NrpHEyZfr1BfNs3pum 1

52598833 rows × 2 columns

Import adresses with entity¶

In [7]:
csv_entity = "dump_bitcoin_10_01_2024/addresses_2016.csv"

# Read the TSV file into a pandas DataFrame
df_entity = pd.read_csv(csv_entity) 
df_entity = df_entity.rename(columns={'address': 'hash_address'})
In [8]:
df_entity
Out[8]:
address_num address_type hash_address cluster data addresses clusters entity_name entity_type
0 20803569 pubkeyhash 1DrK44np3gMKuvcGeFVv9Jk67zodP52eMu 107718037.0 NaN NaN NaN BitFury mining
1 47665646 pubkeyhash 1Ay1fs7ghRvURVcYMxEk1in9PMtrrMKoSR 117741795.0 NaN NaN NaN NaN NaN
2 168251500 pubkeyhash 1CuCRqGMc1Vbax4nk5JTALepVAGR6eh2AW 23159943.0 NaN NaN NaN NaN NaN
3 173405976 pubkeyhash 1HiWFALZS9WR6ZhptzDidpGEvDUdbqzGAi 182789480.0 NaN NaN NaN NaN NaN
4 173432615 pubkeyhash 1LHoMBBnYboJgsfyn1Jk9PfjH274L96qgL 182811195.0 NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ...
720647 173936534 pubkeyhash 1G7oWEsDbEwdr5WE1k9w1wkBuvKpu9quZ4 183177928.0 NaN NaN NaN NaN NaN
720648 173936535 pubkeyhash 1MoNs5uH4CF43XHH17NUCYYVw1Nv68BvFX 155137012.0 NaN NaN NaN NaN NaN
720649 85082659 pubkeyhash 1EUgmNoALra2YSCGcqdpLCNnGYAFQeLEp9 132278807.0 NaN NaN NaN NaN NaN
720650 8373447 multisig1of3 NaN NaN NaN ['1PexU18WvWPTcEFWpFkzRdm1qXWShoeBan', '12xeyA... [183177929, 183177930, 132278807] NaN NaN
720651 8373448 multisig1of3 NaN NaN NaN ['1HgBEFp4SyFcBWtJChFrozPDLBhdhh7pS6', '1FL3PH... [183177931, 183177932, 132278807] NaN NaN

720652 rows × 9 columns

Import inputs¶

Column Description
block_id Identifier of the block in which the transaction is included.
transaction_hash Unique hash identifier of the transaction.
index Index of the input in the transaction.
time Timestamp of when the input transaction was recorded in the blockchain.
value Monetary value of the input in satoshis.
value_usd Monetary value of the input in USD.
recipient Address number of the input recipient.
type Type of the address (e.g., pubkey, pubkeyhash, scripthash, multisig).
script_hex Hexadecimal representation of the input script.
is_from_coinbase Binary indicator (True/False) if the input is from a coinbase transaction.
is_spendable Binary indicator (True/False) if the input is spendable.
spending_block_id Identifier of the block where the input is spent.
spending_transaction_hash Unique hash identifier of the spending transaction.
spending_index Index of the spent output in the spending transaction.
spending_time Timestamp of when the spending transaction was recorded.
spending_value_usd Monetary value of the spending input in USD.
spending_sequence Sequence number of the spending transaction.
spending_signature_hex Hexadecimal representation of the spending input signature.
spending_witness Binary indicator (True/False) if the spending transaction uses a witness.
lifespan Time difference (in seconds) between the spending and input transactions.
cdd Cumulative Daily Difficulty (CDD) of the input transaction.
In [9]:
tsv_inputs = "dump_bitcoin_10_01_2024/blockchair_bitcoin_inputs_20240110.tsv"

# Read the TSV file into a pandas DataFrame
df_inputs = pd.read_csv(tsv_inputs, sep='\t')
df_inputs = df_inputs.rename(columns={'recipient': 'hash_address'})

# Display the DataFrame
print("#### Intputs dimension ####")
print(df_inputs.shape)

print("\n#### Intputs attributes ####")
print(df_inputs.columns.tolist())

df_inputs = df_inputs.drop_duplicates(subset='transaction_hash', keep='first')

df_inputs
#### Intputs dimension ####
(899940, 21)

#### Intputs attributes ####
['block_id', 'transaction_hash', 'index', 'time', 'value', 'value_usd', 'hash_address', 'type', 'script_hex', 'is_from_coinbase', 'is_spendable', 'spending_block_id', 'spending_transaction_hash', 'spending_index', 'spending_time', 'spending_value_usd', 'spending_sequence', 'spending_signature_hex', 'spending_witness', 'lifespan', 'cdd']
Out[9]:
block_id transaction_hash index time value value_usd hash_address type script_hex is_from_coinbase is_spendable spending_block_id spending_transaction_hash spending_index spending_time spending_value_usd spending_sequence spending_signature_hex spending_witness lifespan cdd
0 824788 8d27bc23cd585a7181e967a626e34a2292a09b9cd9c387... 0 2024-01-07 22:24:21 465974173 2.050612e+05 bc1qykpk3q0m9ku2d9fc6q8hxn3ksk4z8a7g8da089 witness_v0_scripthash 001425836881fb2db8a69538d00f734e3685aa23f7c8 0 -1 825071 3f325a9160b6d49b1a60bdeb56689838a7fd9c38f2af56... 0 2024-01-10 00:35:31 2.142223e+05 4294967293 NaN 3044022015fcf7ec7036938864e49e5851b7335870f0ef... 180670 9.743930
1 824840 680643bf66468a9766f8ab0fabeb460961131387f6cfd6... 1 2024-01-08 09:18:16 8729714 3.832781e+03 1JwPSPmZqiAKKenSq3zuVexvrzEjwWzHg8 pubkeyhash 76a914c4c35494c994805b405ce3df19d7d2b1565c3f15... 0 -1 825071 5c56ae6de10afbcc1786c3093e0fb92d4260d1cb9090ec... 0 2024-01-10 00:35:31 4.013311e+03 268435456 47304402201e7a2ae8e888bbefeb711979a7f6b51e9bf0... NaN 141435 0.142904
2 825030 4514db523968fdcec96d62f24e5a079ff9c4df70d56c32... 0 2024-01-09 17:54:57 210000000 9.892890e+04 bc1qgvrek7mjkddrcq3w2t3xc06ekqdaqs23fw4gav witness_v0_scripthash 001443079b7b72b35a3c022e52e26c3f59b01bd04151 0 -1 825071 ef3cde8f1ce0c98388b973f855aaee003e9265dfa09b38... 0 2024-01-10 00:35:31 9.654330e+04 4294967293 NaN 3045022100860693bdce15907426b7e309ae08f765be8a... 24034 0.584160
3 825061 d037da05e086ee5375e273cbb4e5cb26db68b289934353... 0 2024-01-09 22:34:03 17000000000 8.008530e+06 3LhVNi7CGDKLmEfnNGaJG2KnF1n3Pvf5r4 scripthash a914d0814e10ddbe18fd27941598851b62ec073b4edd87 0 -1 825071 ef91085cbe36fb48b8391ada7af60e18b6e752eb4efc3a... 0 2024-01-10 00:35:31 7.815410e+06 4294967295 160014ad74d075c9e0e2daef1bc1ac0a1620618917d6aa 30440220072ec26fbc0c3e01f4d862b16dfbc442976e2d... 7288 14.339815
4 824019 b7d6de5f867d4a00d09a9cf82902d4b0f0da2d750752c1... 1 2024-01-02 14:28:12 493887 2.205008e+02 34ZbJuSvQaCB8Ms4AFMEgvegg58yEpXdnW scripthash a9141f80eb22f83b9515637c37ca4c0e91a9a6e17b1287 0 -1 825071 ef91085cbe36fb48b8391ada7af60e18b6e752eb4efc3a... 1 2024-01-10 00:35:31 2.270547e+02 4294967295 160014308cc437d0807917e6083ce12ed5e0f8f7d7661f 3045022100faf6c021a03ec28271e1f662cafcf7935667... 641239 0.036655
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
899771 825202 9610e6cedece37f246afcbfb9d9d4ce51717a0d3f5b093... 2 2024-01-10 23:34:19 16972653 7.802838e+03 bc1pfp764ez6ssxcnyfg62qy50eu32u9h6d920ls457aj0... witness_v1_taproot 5120487daae45a840d899128d2804a3f3c8ab85be9a553... 0 -1 825203 d1236a945709814363b79497fbe0ce6eb3dd26a12f81ae... 0 2024-01-10 23:51:21 7.802838e+03 4294967295 NaN dc1721a4ce4e7dd5ed0384f1659c6d68f912a7b193188c... 1022 0.002008
899772 825203 d1236a945709814363b79497fbe0ce6eb3dd26a12f81ae... 1 2024-01-10 23:51:21 16907478 7.772875e+03 bc1pfp764ez6ssxcnyfg62qy50eu32u9h6d920ls457aj0... witness_v1_taproot 5120487daae45a840d899128d2804a3f3c8ab85be9a553... 0 -1 825203 0c1ee15673e7cbb83ef9046fe03bbe6fbcae4b6b338d41... 0 2024-01-10 23:51:21 7.772875e+03 4294967295 NaN 31caa0c6583b0b8cdb7c3be1a1410304e8a927a9e64e4d... 0 0.000000
899774 825203 c24abf0fca46bd5a6ab65868de80bf3fc833af6bd57838... 1 2024-01-10 23:51:21 44508 2.046170e+01 bc1qg6ay59mm4lh6g27qfcpqshtqrpu8mffq5fp6tc witness_v0_scripthash 001446ba4a177bafefa42bc04e02085d6018787da520 0 -1 825203 077fa44271cc81698acef7321c3011cae34c030f31045e... 0 2024-01-10 23:51:21 2.046170e+01 4294967295 NaN 3045022100e0b17569f046512218b7354b996193af67a1... 0 0.000000
899775 825200 68da72d8daaef9541adfd4a212cd72ecaee6a541c90f39... 0 2024-01-10 23:03:46 316553914 1.455293e+05 bc1q7084tx0ztvxq3mfnx8nwpa4hpwuxf8zxrj3tcvyn6z... witness_v0_keyhash 0020f3cf5599e25b0c08ed3331e6e0f6b70bb8649c461c... 0 -1 825203 8ea76e6b6ac29c91c9947f737a92d84c34e6e80dd3f091... 0 2024-01-10 23:51:21 1.455293e+05 4294967295 NaN 3044022079009aec6f11fdedeb1748577c0690ea6977d6... 2855 0.104602
899778 825203 b9f0148dd7f89ce61d3012b1e3debbd743daed52af19c5... 0 2024-01-10 23:51:21 546 2.510000e-01 bc1q6paugjafz08l60guyu8r6ut743va7aju0qj6nh witness_v0_scripthash 0014d07bc44ba913cffd3d1c270e3d717eac59df765c 0 -1 825203 f8fea9b38b16f76e6ac5a5ecb50a6cd4911689d4a4b0b4... 0 2024-01-10 23:51:21 2.510000e-01 4294967295 NaN 3045022100da09241a0846475b3fea18c28881609f66ec... 0 0.000000

537886 rows × 21 columns

Import outputs¶

Column Description
block_id Identifier of the block in which the transaction is included.
transaction_hash Unique hash identifier of the transaction.
index Index of the output in the transaction.
time Timestamp of when the output transaction was recorded in the blockchain.
value Monetary value of the output in satoshis.
value_usd Monetary value of the output in USD.
recipient Address number of the output recipient.
type Type of the address (e.g., pubkey, pubkeyhash, scripthash, multisig).
script_hex Hexadecimal representation of the output script.
is_from_coinbase Binary indicator (True/False) if the output is from a coinbase transaction.
is_spendable Binary indicator (True/False) if the output is spendable.
In [10]:
tsv_outputs = "dump_bitcoin_10_01_2024/blockchair_bitcoin_outputs_20240110.tsv"

# Read the TSV file into a pandas DataFrame
df_outputs = pd.read_csv(tsv_outputs, sep='\t')
df_outputs = df_outputs.rename(columns={'hash': 'transaction_hash', 'recipient': 'hash_address'})

# Display the DataFrame
print("#### Outputs dimension ####")
print(df_outputs.shape)

print("\n#### Outputs attributes ####")
print(df_outputs.columns.tolist())

df_outputs = df_outputs.drop_duplicates(subset='transaction_hash', keep='first')

df_outputs
#### Outputs dimension ####
(1295729, 11)

#### Outputs attributes ####
['block_id', 'transaction_hash', 'index', 'time', 'value', 'value_usd', 'hash_address', 'type', 'script_hex', 'is_from_coinbase', 'is_spendable']
Out[10]:
block_id transaction_hash index time value value_usd hash_address type script_hex is_from_coinbase is_spendable
0 825071 ef823e020330de717cc72f834cc3dc7f32baf4ea7e3e5b... 0 2024-01-10 00:35:31 737190208 3.389085e+05 3L8Ck6bm3sve1vJGKo6Ht2k167YKSKi8TZ scripthash a914ca35b1f4d02907314852f09935b9604507f8d70087 1 -1
5 825071 3f325a9160b6d49b1a60bdeb56689838a7fd9c38f2af56... 0 2024-01-10 00:35:31 200000000 9.194600e+04 1PbpDWfJLyLC6NAmNHgCaMGV3EFfCm6imi pubkeyhash 76a914f7e852d9db839c950866cc1aba405eaeef61e939... 0 -1
9 825071 5c56ae6de10afbcc1786c3093e0fb92d4260d1cb9090ec... 0 2024-01-10 00:35:31 5927900 2.725233e+03 1KsYPRDveiGwHPn8mCrPKyaeRNVmCJ3Bdk pubkeyhash 76a914cf0143288aca690d76de6107fd3d9966c282af0e... 0 -1
11 825071 ef3cde8f1ce0c98388b973f855aaee003e9265dfa09b38... 0 2024-01-10 00:35:31 40000000 1.838920e+04 15zrhjZ5N7ThsohJjM3Jp5xtaM1QYNUT6M pubkeyhash 76a91436d1b3cd51bd1a4cd4f04479698bde41b13952f1... 0 -1
15 825071 ef91085cbe36fb48b8391ada7af60e18b6e752eb4efc3a... 0 2024-01-10 00:35:31 17000000000 7.815410e+06 3K5oE7ew2BPG5tsAB6wt6fFfzQRoGg1yHy scripthash a914bec91b4309e151ef594a9bfd490c3bcd736a6e0487 0 -1
... ... ... ... ... ... ... ... ... ... ... ...
1295724 825203 6fdf800be6274f476ad89322818a5ba54ba9f9cd530792... 0 2024-01-10 23:51:21 294 1.352000e-01 bc1qntwyll7kkrsn4xtqvyvnwpx5hdkxzfla5muhh7 witness_v0_scripthash 00149adc4fffd6b0e13a996061193704d4bb6c6127fd 0 -1
1295725 825203 cf81f0ca6c1ec5639ff010dfe44a4fbe41708e0eb6ab80... 0 2024-01-10 23:51:21 294 1.352000e-01 bc1qntwyll7kkrsn4xtqvyvnwpx5hdkxzfla5muhh7 witness_v0_scripthash 00149adc4fffd6b0e13a996061193704d4bb6c6127fd 0 -1
1295726 825203 fb8907c6983b9569c2eb8532e58b2db83752799390e69e... 0 2024-01-10 23:51:21 294 1.352000e-01 bc1qntwyll7kkrsn4xtqvyvnwpx5hdkxzfla5muhh7 witness_v0_scripthash 00149adc4fffd6b0e13a996061193704d4bb6c6127fd 0 -1
1295727 825203 37e8e17e238972b852f143c6c3473128d821da14159f3b... 0 2024-01-10 23:51:21 294 1.352000e-01 bc1qntwyll7kkrsn4xtqvyvnwpx5hdkxzfla5muhh7 witness_v0_scripthash 00149adc4fffd6b0e13a996061193704d4bb6c6127fd 0 -1
1295728 825203 b7311d24739cdf0a680f5c70ee09ea15e2134e3fc1a7ff... 0 2024-01-10 23:51:21 294 1.352000e-01 bc1qntwyll7kkrsn4xtqvyvnwpx5hdkxzfla5muhh7 witness_v0_scripthash 00149adc4fffd6b0e13a996061193704d4bb6c6127fd 0 -1

383774 rows × 11 columns

Filter out columns¶

In [11]:
df_transactions = df_transactions[["transaction_hash", "time", "size", "weight", "input_count", "output_count",
                                  "input_total", "input_total_usd", "output_total", "output_total_usd", "fee",
                                  "fee_usd", "is_coinbase"]]

df_entity = df_entity[["address_num", "address_type", "hash_address", "entity_name", "entity_type"]]

df_inputs = df_inputs[["transaction_hash", "index", "hash_address",  "value", "value_usd", "time", "spending_transaction_hash", "spending_time", "spending_value_usd"]]
df_inputs = df_inputs.rename(columns={
    'index': 'index_input',
    'hash_address': 'hash_address_input',
    'value': 'value_input',
    'value_usd': 'value_usd_input',
    'time': 'time_input'
})


df_outputs = df_outputs[["transaction_hash", "index", "hash_address", "value", "value_usd", "time"]]
df_outputs = df_outputs.rename(columns={
    'index': 'index_output',
    'hash_address': 'hash_address_output',
    'value': 'value_output',
    'value_usd': 'value_usd_output',
    'time': 'time_output'
})

Merge inputs and outputs based on the hash of the transaction¶

In [12]:
edges = pd.merge(df_inputs, df_outputs, on='transaction_hash', how='inner')
In [13]:
print("=== Columns ===")
print(edges.columns)

print("\n\n=== Dimensions ===")
print(edges.shape)

print("\n\n=== Sample ===")
edges.head()
=== Columns ===
Index(['transaction_hash', 'index_input', 'hash_address_input', 'value_input',
       'value_usd_input', 'time_input', 'spending_transaction_hash',
       'spending_time', 'spending_value_usd', 'index_output',
       'hash_address_output', 'value_output', 'value_usd_output',
       'time_output'],
      dtype='object')


=== Dimensions ===
(181958, 14)


=== Sample ===
Out[13]:
transaction_hash index_input hash_address_input value_input value_usd_input time_input spending_transaction_hash spending_time spending_value_usd index_output hash_address_output value_output value_usd_output time_output
0 19164e615c48161141ff5c750c08c6a21202446b16720a... 0 bc1pu9uysdxzq99fzat8myautah7qnahec64dm3smpc95l... 47076 21.6423 2024-01-10 00:35:31 b3b8d806902bbbb45371a16344c004c77a3d09f280f92d... 2024-01-10 00:35:31 21.6423 0 bc1pu9uysdxzq99fzat8myautah7qnahec64dm3smpc95l... 47076 21.6423 2024-01-10 00:35:31
1 93f96b0e895bce583c0762d714e5d6cfb0eb2df5f4c8e7... 0 14dJRoKyj2i83uRbTUeKqhFMwvFZcpiXyn 768666170 353378.9062 2024-01-10 00:35:31 7525607e5547dafaf647d8a0f4073ba873c0b3e614cf77... 2024-01-10 00:35:31 353378.9062 0 14dJRoKyj2i83uRbTUeKqhFMwvFZcpiXyn 768666170 353378.9062 2024-01-10 00:35:31
2 b3b8d806902bbbb45371a16344c004c77a3d09f280f92d... 0 bc1qwfn4fq5l86snpddxa7wy9jh6dc86vd6jdquw2g 546 0.2510 2024-01-10 00:35:31 2a3b7681cc8e2828f6d167f94784c16a01d063e32aae64... 2024-01-10 00:35:31 0.2510 0 bc1qwfn4fq5l86snpddxa7wy9jh6dc86vd6jdquw2g 546 0.2510 2024-01-10 00:35:31
3 6e2d986e8a97c21b51e5f5c8d03a6e35c5d575aff5e8cc... 0 bc1pk8fr0lkcfp7csz04hqn4alqy332h0nksmqmjjg9hye... 42786 19.6700 2024-01-10 00:35:31 f35ad09853a3707e9146fc30eb049142f8b66fd677bf97... 2024-01-10 00:35:31 19.6700 0 bc1pk8fr0lkcfp7csz04hqn4alqy332h0nksmqmjjg9hye... 42786 19.6700 2024-01-10 00:35:31
4 f35ad09853a3707e9146fc30eb049142f8b66fd677bf97... 0 1GrwDkr33gT6LuumniYjKEGjTLhsL5kmqC 546 0.2510 2024-01-10 00:35:31 1a7622dab7b996900b4ba60dd598aa29d75dcfd244924b... 2024-01-10 00:35:31 0.2510 0 1GrwDkr33gT6LuumniYjKEGjTLhsL5kmqC 546 0.2510 2024-01-10 00:35:31
In [14]:
edges = edges[edges['hash_address_input'] == edges['hash_address_output']]
edges = edges[['transaction_hash', 'spending_transaction_hash', 'time_input', 'time_output', 
                   'hash_address_input', 'hash_address_output', 
                   'value_input', 'value_output', 'value_usd_input', 'value_usd_output']]
edges
Out[14]:
transaction_hash spending_transaction_hash time_input time_output hash_address_input hash_address_output value_input value_output value_usd_input value_usd_output
0 19164e615c48161141ff5c750c08c6a21202446b16720a... b3b8d806902bbbb45371a16344c004c77a3d09f280f92d... 2024-01-10 00:35:31 2024-01-10 00:35:31 bc1pu9uysdxzq99fzat8myautah7qnahec64dm3smpc95l... bc1pu9uysdxzq99fzat8myautah7qnahec64dm3smpc95l... 47076 47076 21.6423 21.6423
1 93f96b0e895bce583c0762d714e5d6cfb0eb2df5f4c8e7... 7525607e5547dafaf647d8a0f4073ba873c0b3e614cf77... 2024-01-10 00:35:31 2024-01-10 00:35:31 14dJRoKyj2i83uRbTUeKqhFMwvFZcpiXyn 14dJRoKyj2i83uRbTUeKqhFMwvFZcpiXyn 768666170 768666170 353378.9062 353378.9062
2 b3b8d806902bbbb45371a16344c004c77a3d09f280f92d... 2a3b7681cc8e2828f6d167f94784c16a01d063e32aae64... 2024-01-10 00:35:31 2024-01-10 00:35:31 bc1qwfn4fq5l86snpddxa7wy9jh6dc86vd6jdquw2g bc1qwfn4fq5l86snpddxa7wy9jh6dc86vd6jdquw2g 546 546 0.2510 0.2510
3 6e2d986e8a97c21b51e5f5c8d03a6e35c5d575aff5e8cc... f35ad09853a3707e9146fc30eb049142f8b66fd677bf97... 2024-01-10 00:35:31 2024-01-10 00:35:31 bc1pk8fr0lkcfp7csz04hqn4alqy332h0nksmqmjjg9hye... bc1pk8fr0lkcfp7csz04hqn4alqy332h0nksmqmjjg9hye... 42786 42786 19.6700 19.6700
4 f35ad09853a3707e9146fc30eb049142f8b66fd677bf97... 1a7622dab7b996900b4ba60dd598aa29d75dcfd244924b... 2024-01-10 00:35:31 2024-01-10 00:35:31 1GrwDkr33gT6LuumniYjKEGjTLhsL5kmqC 1GrwDkr33gT6LuumniYjKEGjTLhsL5kmqC 546 546 0.2510 0.2510
... ... ... ... ... ... ... ... ... ... ...
181947 342750fe8b5fd681d07103f377528b1528415c467b6a3a... 26d078c6cc1cb51e207663c4f6fa0ec45c3f3a1a7228a8... 2024-01-10 23:51:21 2024-01-10 23:51:21 bc1pxx9qcafmhg0er6w92ffcstfwksymjy7tmvwwzqh54m... bc1pxx9qcafmhg0er6w92ffcstfwksymjy7tmvwwzqh54m... 35277 35277 16.2179 16.2179
181948 806a43c458e6c6fb407452fcce180091ad1daa80237760... 6f6fb148dad5038fdd8a03dd35b45fd13aa1ca5f93a378... 2024-01-10 23:51:21 2024-01-10 23:51:21 bc1plz9ytx2a24va8zu59shg8unv6n86anvqd9kvcp6vcf... bc1plz9ytx2a24va8zu59shg8unv6n86anvqd9kvcp6vcf... 11665 11665 5.3628 5.3628
181952 35b2fd56400a31ef30263e39083c5dc387addb20ff42c7... 83d2ff592f8f7064185cdd7168c63e6fb89bb3d9ab97bd... 2024-01-10 23:34:19 2024-01-10 23:34:19 bc1qzcgex3h3sqmzj2qjztk5k9lhd8m3ts4r27lu02 bc1qzcgex3h3sqmzj2qjztk5k9lhd8m3ts4r27lu02 286065 286065 131.5127 131.5127
181956 68da72d8daaef9541adfd4a212cd72ecaee6a541c90f39... 8ea76e6b6ac29c91c9947f737a92d84c34e6e80dd3f091... 2024-01-10 23:03:46 2024-01-10 23:03:46 bc1q7084tx0ztvxq3mfnx8nwpa4hpwuxf8zxrj3tcvyn6z... bc1q7084tx0ztvxq3mfnx8nwpa4hpwuxf8zxrj3tcvyn6z... 316553914 316553914 145529.3281 145529.3281
181957 b9f0148dd7f89ce61d3012b1e3debbd743daed52af19c5... f8fea9b38b16f76e6ac5a5ecb50a6cd4911689d4a4b0b4... 2024-01-10 23:51:21 2024-01-10 23:51:21 bc1q6paugjafz08l60guyu8r6ut743va7aju0qj6nh bc1q6paugjafz08l60guyu8r6ut743va7aju0qj6nh 546 546 0.2510 0.2510

105532 rows × 10 columns

PART II : MODELLING¶

/!\ RUN THE NOTEBOOK FROM HERE¶

In [15]:
import os
import pandas as pd
import numpy as np

### Ondine path ###
# path = os.chdir('/Users/ondineji/Library/Mobile Documents/com~apple~CloudDocs/Documents/ACADEMIQUE/M2 FTD/S2/Network Analysis in Python /projet/data/')

### Hugo path ###
path = os.chdir('C:/Users/humic/OneDrive/Documents/Ecole/SorbonneFTD/Cours/Data_Mining_Network/project/data/')
pd.set_option('display.max_columns', None)

import warnings
warnings.filterwarnings("ignore")

# import data viz libs
import matplotlib.pyplot as plt
from matplotlib import cm

# Import graph libs
import networkx as nx
from nxviz import CircosPlot, ArcPlot
from cdlib import algorithms

# import custom functions

from utils import *

Build the transaction network for the SEC announcement on ETF bicoin¶

Build edges¶

Edge = an input / an output (attributes: address/entity, amount)

In [16]:
edges_df = pd.read_csv("edges.csv")
edges_df = edges_df[['transaction_hash', 'spending_transaction_hash', 'time_input', 'hash_address_input', 'value_input', 'value_usd_input']]
edges_df
Out[16]:
transaction_hash spending_transaction_hash time_input hash_address_input value_input value_usd_input
0 19164e615c48161141ff5c750c08c6a21202446b16720a... b3b8d806902bbbb45371a16344c004c77a3d09f280f92d... 2024-01-10 00:35:31 bc1pu9uysdxzq99fzat8myautah7qnahec64dm3smpc95l... 47076 21.6423
1 93f96b0e895bce583c0762d714e5d6cfb0eb2df5f4c8e7... 7525607e5547dafaf647d8a0f4073ba873c0b3e614cf77... 2024-01-10 00:35:31 14dJRoKyj2i83uRbTUeKqhFMwvFZcpiXyn 768666170 353378.9062
2 b3b8d806902bbbb45371a16344c004c77a3d09f280f92d... 2a3b7681cc8e2828f6d167f94784c16a01d063e32aae64... 2024-01-10 00:35:31 bc1qwfn4fq5l86snpddxa7wy9jh6dc86vd6jdquw2g 546 0.2510
3 6e2d986e8a97c21b51e5f5c8d03a6e35c5d575aff5e8cc... f35ad09853a3707e9146fc30eb049142f8b66fd677bf97... 2024-01-10 00:35:31 bc1pk8fr0lkcfp7csz04hqn4alqy332h0nksmqmjjg9hye... 42786 19.6700
4 f35ad09853a3707e9146fc30eb049142f8b66fd677bf97... 1a7622dab7b996900b4ba60dd598aa29d75dcfd244924b... 2024-01-10 00:35:31 1GrwDkr33gT6LuumniYjKEGjTLhsL5kmqC 546 0.2510
... ... ... ... ... ... ...
105527 342750fe8b5fd681d07103f377528b1528415c467b6a3a... 26d078c6cc1cb51e207663c4f6fa0ec45c3f3a1a7228a8... 2024-01-10 23:51:21 bc1pxx9qcafmhg0er6w92ffcstfwksymjy7tmvwwzqh54m... 35277 16.2179
105528 806a43c458e6c6fb407452fcce180091ad1daa80237760... 6f6fb148dad5038fdd8a03dd35b45fd13aa1ca5f93a378... 2024-01-10 23:51:21 bc1plz9ytx2a24va8zu59shg8unv6n86anvqd9kvcp6vcf... 11665 5.3628
105529 35b2fd56400a31ef30263e39083c5dc387addb20ff42c7... 83d2ff592f8f7064185cdd7168c63e6fb89bb3d9ab97bd... 2024-01-10 23:34:19 bc1qzcgex3h3sqmzj2qjztk5k9lhd8m3ts4r27lu02 286065 131.5127
105530 68da72d8daaef9541adfd4a212cd72ecaee6a541c90f39... 8ea76e6b6ac29c91c9947f737a92d84c34e6e80dd3f091... 2024-01-10 23:03:46 bc1q7084tx0ztvxq3mfnx8nwpa4hpwuxf8zxrj3tcvyn6z... 316553914 145529.3281
105531 b9f0148dd7f89ce61d3012b1e3debbd743daed52af19c5... f8fea9b38b16f76e6ac5a5ecb50a6cd4911689d4a4b0b4... 2024-01-10 23:51:21 bc1q6paugjafz08l60guyu8r6ut743va7aju0qj6nh 546 0.2510

105532 rows × 6 columns

In [17]:
edges = edges_df.rename(columns={
    'transaction_hash': 'from_transaction',
    'spending_transaction_hash': 'to_transaction',
    'time_input': 'time',
    'hash_address_input': 'hash_address',
    'value_input': 'value',
    'value_usd_input': 'value_usd'
})
edges
Out[17]:
from_transaction to_transaction time hash_address value value_usd
0 19164e615c48161141ff5c750c08c6a21202446b16720a... b3b8d806902bbbb45371a16344c004c77a3d09f280f92d... 2024-01-10 00:35:31 bc1pu9uysdxzq99fzat8myautah7qnahec64dm3smpc95l... 47076 21.6423
1 93f96b0e895bce583c0762d714e5d6cfb0eb2df5f4c8e7... 7525607e5547dafaf647d8a0f4073ba873c0b3e614cf77... 2024-01-10 00:35:31 14dJRoKyj2i83uRbTUeKqhFMwvFZcpiXyn 768666170 353378.9062
2 b3b8d806902bbbb45371a16344c004c77a3d09f280f92d... 2a3b7681cc8e2828f6d167f94784c16a01d063e32aae64... 2024-01-10 00:35:31 bc1qwfn4fq5l86snpddxa7wy9jh6dc86vd6jdquw2g 546 0.2510
3 6e2d986e8a97c21b51e5f5c8d03a6e35c5d575aff5e8cc... f35ad09853a3707e9146fc30eb049142f8b66fd677bf97... 2024-01-10 00:35:31 bc1pk8fr0lkcfp7csz04hqn4alqy332h0nksmqmjjg9hye... 42786 19.6700
4 f35ad09853a3707e9146fc30eb049142f8b66fd677bf97... 1a7622dab7b996900b4ba60dd598aa29d75dcfd244924b... 2024-01-10 00:35:31 1GrwDkr33gT6LuumniYjKEGjTLhsL5kmqC 546 0.2510
... ... ... ... ... ... ...
105527 342750fe8b5fd681d07103f377528b1528415c467b6a3a... 26d078c6cc1cb51e207663c4f6fa0ec45c3f3a1a7228a8... 2024-01-10 23:51:21 bc1pxx9qcafmhg0er6w92ffcstfwksymjy7tmvwwzqh54m... 35277 16.2179
105528 806a43c458e6c6fb407452fcce180091ad1daa80237760... 6f6fb148dad5038fdd8a03dd35b45fd13aa1ca5f93a378... 2024-01-10 23:51:21 bc1plz9ytx2a24va8zu59shg8unv6n86anvqd9kvcp6vcf... 11665 5.3628
105529 35b2fd56400a31ef30263e39083c5dc387addb20ff42c7... 83d2ff592f8f7064185cdd7168c63e6fb89bb3d9ab97bd... 2024-01-10 23:34:19 bc1qzcgex3h3sqmzj2qjztk5k9lhd8m3ts4r27lu02 286065 131.5127
105530 68da72d8daaef9541adfd4a212cd72ecaee6a541c90f39... 8ea76e6b6ac29c91c9947f737a92d84c34e6e80dd3f091... 2024-01-10 23:03:46 bc1q7084tx0ztvxq3mfnx8nwpa4hpwuxf8zxrj3tcvyn6z... 316553914 145529.3281
105531 b9f0148dd7f89ce61d3012b1e3debbd743daed52af19c5... f8fea9b38b16f76e6ac5a5ecb50a6cd4911689d4a4b0b4... 2024-01-10 23:51:21 bc1q6paugjafz08l60guyu8r6ut743va7aju0qj6nh 546 0.2510

105532 rows × 6 columns

Let's try to match the hash_address with the entity name¶

For reminder we imported previously a dataset which contain the entity name associated to hash_adress from 2016.

Do not run this cell because we need to import before df_entity dataset¶

In [18]:
# merged_df = pd.merge(edges, df_entity[['hash_address', 'entity_name', 'entity_type', 'address_type']], on='hash_address', how='left')
# merged_df["entity_name"].value_counts()
Out[18]:
entity_name
ePay.info    1
Name: count, dtype: int64

Unfortunately, we were unable to match all the hash addresses with the hash addresses for which we know the name of the entity. Indeed, the only entity we have managed to match is ebay So, for lack of data, we abandon the idea of being able to match the addresses of our graph with the names of entities.

Let's add edges with following attributes:¶

  • hash_address
  • value
  • value_usd
In [19]:
# Add edge list from pandas dataframe
G = nx.from_pandas_edgelist(edges, 'from_transaction', 'to_transaction', ['hash_address', 'value', 'value_usd'], 
                        create_using=nx.MultiDiGraph)

Check whether the transaction bitcoin network is acyclic¶

In [20]:
nx.is_directed_acyclic_graph(G)
Out[20]:
True

The output is true means that our network is acyclic. That's a good point because a transaction bitcoin network is directed acyclic graph because we cannot, it is not possible to make a transaction to oneself (i.e identical source address and target adress for a given transaction)

Let's visualize an edge of the network¶

  • source hash_adress: 19164e615c48161141ff5c750c08c6a21202446b16720a4f27541fa32cd2c695
  • target hash_adress: b3b8d806902bbbb45371a16344c004c77a3d09f280f92de83f341720aa537b7f
In [21]:
G["19164e615c48161141ff5c750c08c6a21202446b16720a4f27541fa32cd2c695"]["b3b8d806902bbbb45371a16344c004c77a3d09f280f92de83f341720aa537b7f"]
Out[21]:
AtlasView({0: {'hash_address': 'bc1pu9uysdxzq99fzat8myautah7qnahec64dm3smpc95l09eypr5nss4du7v4', 'value': 47076, 'value_usd': 21.6423}})

Build nodes of the bitcoin transaction network¶

Node = a transaction with the following attributes:

  • 'transaction_hash': hash,
  • 'time': timestamp,
  • 'size': int
  • 'weight': int,
  • 'input_count': int,
  • 'output_count': int,
  • 'input_total': int,
  • 'input_total_usd': float,
  • 'output_total': int,
  • 'output_total_usd': float,
  • 'fee': int,
  • 'fee_usd': float,
  • 'is_coinbase': bool
In [22]:
nodes_df = pd.read_csv("nodes.csv")
nodes_df = nodes_df[['transaction_hash', 'time', 'size', 'weight', 'input_count',
       'output_count', 'input_total', 'input_total_usd', 'output_total',
       'output_total_usd', 'fee', 'fee_usd', 'is_coinbase']]
nodes_df
Out[22]:
transaction_hash time size weight input_count output_count input_total input_total_usd output_total output_total_usd fee fee_usd is_coinbase
0 ef823e020330de717cc72f834cc3dc7f32baf4ea7e3e5b... 2024-01-10 00:35:31 299 1088 1 3 0 0.000000e+00 737190208 3.389085e+05 0 0.0000 1
1 3f325a9160b6d49b1a60bdeb56689838a7fd9c38f2af56... 2024-01-10 00:35:31 225 573 1 2 465974173 2.142223e+05 465864157 2.141717e+05 110016 50.5777 0
2 5c56ae6de10afbcc1786c3093e0fb92d4260d1cb9090ec... 2024-01-10 00:35:31 225 900 1 2 8729714 4.013311e+03 8583260 3.945982e+03 146454 67.3293 0
3 ef3cde8f1ce0c98388b973f855aaee003e9265dfa09b38... 2024-01-10 00:35:31 226 574 1 2 210000000 9.654330e+04 209928000 9.651020e+04 72000 33.1006 0
4 ef91085cbe36fb48b8391ada7af60e18b6e752eb4efc3a... 2024-01-10 00:35:31 419 1025 2 2 17000493887 7.815637e+06 17000369903 7.815580e+06 123984 56.9992 0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
383769 6fdf800be6274f476ad89322818a5ba54ba9f9cd530792... 2024-01-10 23:51:21 379 625 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 0
383770 cf81f0ca6c1ec5639ff010dfe44a4fbe41708e0eb6ab80... 2024-01-10 23:51:21 379 625 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 0
383771 fb8907c6983b9569c2eb8532e58b2db83752799390e69e... 2024-01-10 23:51:21 379 625 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 0
383772 37e8e17e238972b852f143c6c3473128d821da14159f3b... 2024-01-10 23:51:21 379 625 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 0
383773 b7311d24739cdf0a680f5c70ee09ea15e2134e3fc1a7ff... 2024-01-10 23:51:21 379 625 1 1 6969 3.203900e+00 294 1.352000e-01 6675 3.0687 0

383774 rows × 13 columns

In [23]:
# Add node attributes
nodes = nodes_df.set_index('transaction_hash').to_dict(orient='index')

for transaction_hash, data in nodes.items():
    G.add_node(transaction_hash, time=data['time'], size=data['size'], weight=data['weight'],
               input_count=data['input_count'], output_count=data['output_count'],
               input_total=data['input_total'], input_total_usd=data['input_total_usd'],
               output_total=data['output_total'], output_total_usd=data['output_total_usd'],
               fee=data['fee'], fee_usd=data['fee_usd'], is_coinbase=data['is_coinbase'])

Let's visualize a node (and its attributes) of our bitcoin transaction network¶

In [24]:
G.nodes["19164e615c48161141ff5c750c08c6a21202446b16720a4f27541fa32cd2c695"]
Out[24]:
{'time': '2024-01-10 00:35:31',
 'size': 234,
 'weight': 609,
 'input_count': 1,
 'output_count': 2,
 'input_total': 3213993,
 'input_total_usd': 1477.569,
 'output_total': 3163503,
 'output_total_usd': 1454.3572,
 'fee': 50490,
 'fee_usd': 23.2118,
 'is_coinbase': 0}

Export to .csv the edgdes dataframe and transaction nodes dataframe¶

In [25]:
edges_df.to_csv("edges.csv")
nodes_df.to_csv("nodes.csv")

Descriptive statistics of the graph¶

In [26]:
compute_graph_statistics(G)
MultiDiGraph with 383774 nodes and 105532 edges
Number of nodes 383774
Number of edges 105532
Average degree 0.549969513307311
Density 7.165297106718179e-07

We can note that the number of nodes is more than 3 times greater than the number of edges in the graph. This explains why the density of the graph is very low. This means that a transaction block is generally linked to only one other transaction block. Furthermore, given that the number of degrees (i.e. the number of neighbors) is very small (0.5), we can deduce most of transaction blocks receive only one input transaction and don't have any output transation.

Draw a subset of the graph [Ondine contribution]¶

In [27]:
draw_subset_graph(G, nodes, nb_nodes=5000, title="Plot of a subset of bitcoin transaction network")

The shape of the graph below seems to confirm the above observations. Namely, the majority of transaction blocks are linked only to another transaction block. The vast majority of transaction blocks appear to have, on average, a single input and output. Visually, it's also clear that a large number of transaction blocks have only one input and no output.

Draw the interactions for a specific node transaction [Hugo contribution]¶

In [28]:
hash_node_transaction = "aa51f19748493441d743a0c408201507b99948795f7e2387a103d9c3f415c56a"
plot_graph_for_specific_node(G, hash_node_transaction)

We focus on one transaction in particular. The choice of this transaction is completely arbitrary and the code can be parameterized. Knowing the hash of the transaction in question, it is possible, thanks to the previous function, to trace all the interactions (input and output) of a block of transactions.

In the graph above, we're interested in the interactions of the transaction whose hash is aa51f19748493441d743a0c408201507b99948795f7e2387a103d9c3f415c56a. We can see that this transaction has many more inputs than outputs. This transaction has many more inputs than outputs. Also, this transaction block is connected to 78 other transaction blocks (77 transactions inputs and 1 transaction output).

Let's now focus our analyze on the SEC ETF approval for bitcoin¶

The SEC's approval for the creation of Bitcoin Spot ETFs on US exchanges has important implications for the crypto sphere. It will offer a new way of investing in Bitcoin (BTC) for US investors, which could lead to increased demand and a higher Bitcoin price. Bitcoin Spot ETFs could also attract new investors, notably institutional investors.

So the aim of our study is to investigate the impact of this announcement on Bitcoin transactions.

To this end, we will analyze transactions that took place before and after the SEC's official announcement.

The SEC announcement took place on 10-01-2024 at 10:30pm French time, i.e. 4:30pm Wahshington time. Given that the timestamp used by bitcoin is UTC. The SEC's decision took place at 09:30pm.

Consequently, we will analyze the transactions that took place 2 hours before and after the SEC announcement. This will also enable us to work on lighter graphs (containing fewer nodes and vertices).

Statistic plots of the network [Hugo contribution]¶

In [29]:
plot_statistics_bitcoin_transaction(nodes_df)

These graphs show that the SEC announcement on bitcoin ETFs does not seem to have had any impact on the number of bitcoin transactions. We might have expected the number of transactions to explode after the SEC's announcement, but that's not what we're seeing. In fact, the number of transactions decreased after the SEC announcement. Overall, we're seeing around 3,000-4,000 transactions every 15 minutes before the SEC decision. The same phenomenon is also visible after the SEC decision. We note, however, that the maximum number of transactions before the SEC decision is double the maximum number of transactions after the SEC decision.

Regardint to the amounts involved in mining transaction blocks, we can observe that the fees for mining bitcoin transactions are overwhelmingly in the 2-digit range (i.e. between 0 dollars and 50 dollars). However, there are some transactions for which the mining fee is in 3 digits. For example, the maximum amount was 1750 dollars before the SEC announcement and around 1400 dollars after the SEC announcement.

Finally, the weights associated with each bitcoin transaction block are mostly low. However, some transaction blocks stand out for their very high weights. We can imagine that these transactions are important nodes in our graph (large number of degrees). This suggests that certain nodes play the role of pivot nodes in our graph, due to the large number of neighbors. These nodes must have a large number of inputs and outputs.

Lets build the two transaction bitcoin networks before and after the SEC decision¶

In [30]:
start_time = '2024-01-10 19:30:00'
sec_announcement_time = '2024-01-10 21:30:00'
end_time = '2024-01-10 23:30:00'

G_before, G_after = build_networks_SEC_announcement(G, start_time, sec_announcement_time, end_time)
In [31]:
print("#### Statistics before the SEC announcement ####\n")
compute_graph_statistics(G_before)
print("\n\n#### Statistics after the SEC announcement ####\n")
compute_graph_statistics(G_after)
#### Statistics before the SEC announcement ####

DiGraph with 45415 nodes and 6035 edges
Number of nodes 45415
Number of edges 6035
Average degree 0.26577122096223715
Density 2.9260935059919534e-06


#### Statistics after the SEC announcement ####

DiGraph with 30245 nodes and 5201 edges
Number of nodes 30245
Number of edges 5201
Average degree 0.3439246156389486
Density 5.685832159088557e-06

We can see that the statistics of the transaction graphs before and after the SEC decision on the use of bitcoin ETFs are very similar overall. However, the number of nodes in the graph after the SEC decision is lower than the graph before the SEC decision (-15,000 fewer nodes after the SEC decision). Consequently, the average number of neighbors is higher after the SEC decision.

Find the most important bitoin transactions [Ondine contribution]¶

To do so, we identifying the nodes with the highest degrees

In [32]:
df_top_nodes_degrees_before, nodes_highest_degree_before = nodes_with_highest_degree(G_before, n_node=3000)
df_top_nodes_degrees_after, nodes_highest_degree_after = nodes_with_highest_degree(G_after, n_node=3000)

Plots the most 3000 important nodes (highest degrees)¶

Before the SEC announcement¶
In [33]:
plot_node_with_highest_degrees(G_before, nodes_highest_degree_before, type_sec="(Before SEC announcement)")
After the SEC announcement¶
In [34]:
plot_node_with_highest_degrees(G_after, nodes_highest_degree_after, type_sec="(After SEC announcement)")

We note that the graph before the SEC decision has more nodes with a large number of degrees (yellow and green nodes) than the graph after the SEC decision. On the other hand, the graph after the SEC decision seems to have 2 very important nodes with more degrees (35 neighbors) than the most important nodes (25 neighbors) in the graph before the SEC decision.

Overall, the graphs confirm the statistics we calculated earlier, as they show very similar patterns. Consequently, the SEC's decision seems to have little impact on the dynamics of bitcoin transactions.

Let's focus now on the 10 most important nodes (highest degrees)¶

For obvious reasons of computational resource, we decided to concentrate our analysis on the 10 transactions bicoin nodes with the highest degree.

In [35]:
# Select the first 10 nodes with the highest degrees after and before the SEC decision
df_G_top10_before, nodes10_highest_degree_before = nodes_with_highest_degree(G_before, n_node=10)
df_G_top10_after, nodes10_highest_degree_after = nodes_with_highest_degree(G_after, n_node=10)

top10_nodes_before = df_G_top10_before["hash_transaction"].tolist()
top10_nodes_after = df_G_top10_after["hash_transaction"].tolist()

# Build the subgraph of the 1000 nodes with highest degrees after and before the SEC decision
G_top10_before = build_subgraph_top_nodes(G_before, top10_nodes_before)
G_top10_after = build_subgraph_top_nodes(G_after, top10_nodes_after)

Compute basic statistic for both network (Before and after the SEC announcement) [Ondine Contributions]¶

In [36]:
print("#### Statistics before the SEC announcement ####\n")
compute_graph_statistics(G_top10_before)
print("\n\n#### Statistics after the SEC announcement ####\n")
compute_graph_statistics(G_top10_after)
#### Statistics before the SEC announcement ####

DiGraph with 229 nodes and 219 edges
Number of nodes 229
Number of edges 219
Average degree 1.9126637554585153
Density 0.004194438060216042


#### Statistics after the SEC announcement ####

DiGraph with 222 nodes and 212 edges
Number of nodes 222
Number of edges 212
Average degree 1.90990990990991
Density 0.004321063144592557

By focusing our analysis on the interactions of the 10 largest transactions that took place before and after the SEC ruling. We note that these transactions show similar patterns. Indeed, the descriptive statistics of the two graphs (before and after the SEC decision) are very close. For example, the average number of degrees is identical. Most nodes have 2 neighbors. This suggests that most nodes have a single input and a single output.

In [37]:
plot_distribution_degrees(df_G_top10_before["degrees"], df_G_top10_after["degrees"]);

This graph shows that the graph after the SEC decision has more nodes with higher degrees (over 30 nodes) than the graph before the SEC decision.

This highlights the fact that the graph after the SEC decision has 2-3 nodes playing a pivotal (very important) role.

Plot the two network before and after the SEC announcement [Ondine Contribution]¶

In [38]:
plot_graph_before_after(G_top10_before, G_top10_after, top10_nodes_before, top10_nodes_after)

Plot the betweeness centrality distribution before and after the SEC announcement [Ondine Contribution]¶

Betweenness centrality can be useful in identifying critical nodes in a network, such as key connectors or bottlenecks, and can provide insights into how information or influence spreads throughout the network.

In [39]:
compute_betweenness_centrality(G_top10_before, G_top10_after)

We can see that de betweenness centrality of the both graph (before and after the SEC announcement) are very low. This means that no node in the graph acts as a bridge along the shortest path between pairs of other nodes in both networks. Hence, no nodes semmes to be crucial for maintaining communication and facilitating the flow of transaction between different parts of the network.

Compute degree distribution and degree centrality [Hugo contribution]¶

The degree centrality measures the number of direct connections a node has. Nodes with high degree centrality are highly connected to other nodes in the network and are often considered important for information dissemination.

$$ \frac{\text{Number of neighbours I have}}{\text{Number of neighbours I could possibly have}} $$

Degree Centrality of a node can range from 0 (if the node has no connections) to 1 (if the node is connected to every other node in the network)

In [40]:
compute_degree_distribution(G_top10_before, G_top10_after)

We can note that the distributions of the number of degrees and the distribution of degree centrality are very similar for both network (before and after the SEC announcement). In addition, we can see that the higher the number of degrees, the higher the degree of centrality.

Plot the transactions with the highest input total in dollars ($) [Hugo contribution]¶

In [41]:
circo_plot_graph(G_top10_before, G_top10_after, node_order='input_total_usd', node_color='input_total_usd')
arc_plot_graph(G_top10_before, G_top10_after, node_order='input_total_usd', node_color='input_total_usd')

Plot the transactions with the highest amount in dollars ($) [Hugo contribution]¶

In [42]:
transaction_fees_amount(G_top10_before, G_top10_after)

Plot bitcoin transaction graphs with node size based on the number of neighbors (degrees) and node color based on the dollar amount of the input. [Hugo contribution]¶

In [43]:
plot_transaction_graph_bis(G_top10_before, G_top10_after)

Plot Bitcoin Transaction Network with node size based on degrees and node color based on input total USD and edges size based on the value USD transaction [Hugo contribution]¶

In [44]:
plot_transaction_bitcoin_network(G, G_top10_before, G_top10_after)

Plot betweeness centrality of the bitcoin transactions network [Hugo contribution]¶

Betweenness is a measure of centrality based on shortest paths. The betweenness centrality for each node is the number of these shortest paths that pass through the node.

Before the SEC announcement

In [45]:
plot_betweenness_centrality_network(G_top10_before, type_sec="Before the SEC announcement")

After the SEC announcement

In [46]:
plot_betweenness_centrality_network(G_top10_after, type_sec="After the SEC announcement")

Page Rank [Hugo contribution]¶

PageRank is an algorithm used by Google Search to rank web pages in their search engine results. It is a way of measuring the importance of website pages. For example, PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is.

In [47]:
def apply_pagerank_and_plot_side_by_side(G_before, G_after):
    # Apply PageRank algorithm for both networks
    pagerank_scores_before = nx.pagerank(G_before)
    pagerank_scores_after = nx.pagerank(G_after)

    # Plotting the graphs side by side
    fig, axs = plt.subplots(1, 2, figsize=(15, 6))
    
    # Plot for Before SEC Announcement network
    pos_before = nx.circular_layout(G_before)
    node_sizes_before = [v * 1e4 for v in pagerank_scores_before.values()]  # Adjust multiplier as needed
    node_color_before = list(pagerank_scores_before.values())
    node_colorbar_before = axs[0].scatter([], [], c=[], cmap='coolwarm', edgecolor='none')
    nx.draw(G_before, pos_before, ax=axs[0], with_labels=False, node_color=node_color_before, cmap='coolwarm', edge_color='gray', linewidths=0.5, font_size=8, node_size=node_sizes_before)
    axs[0].set_title("Before SEC Announcement")
    
    # Plot for After SEC Announcement network
    pos_after = nx.circular_layout(G_after)
    node_sizes_after = [v * 1e4 for v in pagerank_scores_after.values()]  # Adjust multiplier as needed
    node_color_after = list(pagerank_scores_after.values())
    node_colorbar_after = axs[1].scatter([], [], c=[], cmap='coolwarm', edgecolor='none')
    nx.draw(G_after, pos_after, ax=axs[1], with_labels=False, node_color=node_color_after, cmap='coolwarm', edge_color='gray', linewidths=0.5, font_size=8, node_size=node_sizes_after)
    axs[1].set_title("After SEC Announcement")
    
    # Add color bar
    cbar_before = fig.colorbar(node_colorbar_before, ax=axs[0], orientation='horizontal', label='PageRank Score')
    cbar_after = fig.colorbar(node_colorbar_after, ax=axs[1], orientation='horizontal', label='PageRank Score')
    
    plt.suptitle("PageRank")
    
    plt.tight_layout()
    plt.show()

# Call the function with the network graphs
apply_pagerank_and_plot_side_by_side(G_top10_before, G_top10_after)

Conclusion¶

In comparing the state of the Bitcoin transaction network before and after the SEC's announcement of a Bitcoin ETF, several key insights emerge.

Firstly, the reduction in the number of nodes post-announcement suggests that while there may have been an anticipation or speculation leading to increased activity before the announcement, the actual event led to a consolidation rather than an expansion in transaction activity. Despite the lower number of nodes, the increase in average degree and density indicates that the transactions that did occur were more interconnected, possibly reflecting a more cautious or strategic approach by those transacting in Bitcoin immediately after the announcement.

Interestingly, the descriptive statistics, such as average degree and density, are relatively close before and after the announcement, suggesting that the overall structure and connectivity of the network remained stable. This stability points to the underlying robustness of the Bitcoin network and indicates that major news events may not have as dramatic an immediate impact as might be expected.

The circos and arc plots depicting the top transactions by total input in USD before and after the announcement demonstrate the concentration of transaction volume among a few significant nodes. These focal nodes could represent large institutional moves or aggregations of smaller transactions by intermediaries.

Finally, betweenness centrality metrics suggest that the network does not rely on a few nodes to facilitate transaction flows. A low betweenness centrality across the network indicates a distributed interconnectivity that does not bottleneck at particular nodes. This further underpins the decentralized nature of the Bitcoin transaction network.

Overall, the SEC's announcement appears to have had a more nuanced effect on the Bitcoin network than might be presumed. Instead of a surge in transaction activity, there seems to be a strategic shift in the way transactions are conducted, with the most significant transactions playing a pivotal role. These insights highlight the complex and multifaceted response of decentralized networks to regulatory and external stimuli.